We are analysing a dataset from NASA’s Goddard Institute for Space Studies to study the effects of climate change in the Northern Hemisphere. Glimpsing at the data, we see there are 19 variables and 143 observations, representing the period between 1880-2022:
weather <-
read_csv("https://data.giss.nasa.gov/gistemp/tabledata_v4/NH.Ts+dSST.csv",
skip = 1,
na = "***")
glimpse(weather)## Rows: 143
## Columns: 19
## $ Year <dbl> 1880, 1881, 1882, 1883, 1884, 1885, 1886, 1887, 1888, 1889, 1890…
## $ Jan <dbl> -0.39, -0.30, 0.26, -0.58, -0.16, -1.01, -0.75, -1.08, -0.49, -0…
## $ Feb <dbl> -0.53, -0.24, 0.21, -0.66, -0.11, -0.45, -0.84, -0.71, -0.61, 0.…
## $ Mar <dbl> -0.23, -0.05, 0.02, -0.15, -0.64, -0.23, -0.71, -0.44, -0.64, -0…
## $ Apr <dbl> -0.30, -0.02, -0.30, -0.30, -0.59, -0.49, -0.37, -0.38, -0.22, 0…
## $ May <dbl> -0.05, 0.05, -0.23, -0.25, -0.36, -0.58, -0.34, -0.25, -0.15, -0…
## $ Jun <dbl> -0.18, -0.33, -0.28, -0.11, -0.41, -0.45, -0.37, -0.20, -0.03, -…
## $ Jul <dbl> -0.21, 0.10, -0.28, -0.05, -0.41, -0.34, -0.14, -0.24, 0.00, -0.…
## $ Aug <dbl> -0.25, -0.04, -0.14, -0.22, -0.51, -0.41, -0.43, -0.54, -0.21, -…
## $ Sep <dbl> -0.24, -0.28, -0.24, -0.34, -0.45, -0.40, -0.33, -0.21, -0.20, -…
## $ Oct <dbl> -0.30, -0.44, -0.51, -0.16, -0.44, -0.37, -0.31, -0.49, -0.03, -…
## $ Nov <dbl> -0.43, -0.36, -0.33, -0.44, -0.57, -0.38, -0.40, -0.27, -0.01, -…
## $ Dec <dbl> -0.42, -0.23, -0.68, -0.15, -0.47, -0.11, -0.22, -0.43, -0.24, -…
## $ `J-D` <dbl> -0.29, -0.18, -0.21, -0.29, -0.43, -0.43, -0.43, -0.44, -0.24, -…
## $ `D-N` <dbl> NA, -0.19, -0.17, -0.33, -0.40, -0.46, -0.42, -0.42, -0.25, -0.1…
## $ DJF <dbl> NA, -0.32, 0.08, -0.64, -0.14, -0.64, -0.57, -0.67, -0.51, -0.08…
## $ MAM <dbl> -0.19, -0.01, -0.17, -0.24, -0.53, -0.43, -0.47, -0.36, -0.33, 0…
## $ JJA <dbl> -0.21, -0.09, -0.23, -0.13, -0.44, -0.40, -0.31, -0.33, -0.08, -…
## $ SON <dbl> -0.32, -0.36, -0.36, -0.31, -0.49, -0.38, -0.35, -0.32, -0.08, -…
For the purpose of our analysis, we have decided to select only data pertaining to temperature deviation (delta) by month, and manipulate the dataframe to facilitate further investigation:
tidyweather <- select(weather, 1:13) %>%
pivot_longer(!Year, names_to = 'month', values_to = 'delta')
head(tidyweather)## # A tibble: 6 × 3
## Year month delta
## <dbl> <chr> <dbl>
## 1 1880 Jan -0.39
## 2 1880 Feb -0.53
## 3 1880 Mar -0.23
## 4 1880 Apr -0.3
## 5 1880 May -0.05
## 6 1880 Jun -0.18
First, we are plotting a scatter plot to visualize the evolution of delta (temperature deviation) over time:
tidyweather <- tidyweather %>%
mutate(date = ymd(paste(as.character(Year), month, "1")),
month = month(date, label=TRUE),
year = year(date))
gg <- ggplot(tidyweather, aes(x=date, y = delta))+
geom_point()+
geom_smooth(color="red") +
theme_bw() +
labs (
title = "Weather Anomalies"
)
#plot_gg(gg,multicore=TRUE,width=5,height=5,scale=250)Adding a line of best fit to the scatterplot clearly shows that, while deltas were close to 0 between approximately 1935-1975 and negative before that time, temperature in the Northern Hempishere has been quickly increasing since then. Notice that the rate of the increase has been increasing as well (as signified by increasing deltas).
Next, we will use facet_wrap() to visualize the effects
of increasing temperature by month:
We can see that the effect of rising temperature in the Northern Hemisphere is common to all months of the year.
As a means to further investigate the effects of climate change, we
will partition the data into time periods, particularly decades. To that
end, we will use case_when():
comparison <- tidyweather %>%
filter(Year>= 1881) %>% #remove years prior to 1881
#create new variable 'interval', and assign values based on criteria below:
mutate(interval = case_when(
Year %in% c(1881:1920) ~ "1881-1920",
Year %in% c(1921:1950) ~ "1921-1950",
Year %in% c(1951:1980) ~ "1951-1980",
Year %in% c(1981:2010) ~ "1981-2010",
TRUE ~ "2011-present"
))In order to study the effects of climate change by decade, we will produce a density plot to investigate the distribution of monthly temperature deviations by the time periods selected above:
ggplot(comparison) +
aes(x = delta, fill = interval)+
#facet_wrap(~month)+
geom_density(alpha = 0.2) The plot clearly shows that with the passage of time, deltas have consistently moved to the right hand side of the plot. In other words, temperature deviations have been increasing over time.
Lastly, we will also consider annual anomalies by grouping the monthly data and producing a scatterplot:
#creating yearly averages
average_annual_anomaly <- tidyweather %>%
group_by(Year) %>% #grouping data by Year
# creating summaries for mean delta
# use `na.rm=TRUE` to eliminate NA (not available) values
summarise(yearly_mean = mean(delta, na.rm=TRUE))
average_annual_anomaly## # A tibble: 143 × 2
## Year yearly_mean
## <dbl> <dbl>
## 1 1880 -0.294
## 2 1881 -0.178
## 3 1882 -0.208
## 4 1883 -0.284
## 5 1884 -0.427
## 6 1885 -0.435
## 7 1886 -0.434
## 8 1887 -0.437
## 9 1888 -0.236
## 10 1889 -0.177
## # … with 133 more rows
#plotting the data
#Fit the best fit line, using LOESS method
ggplot(average_annual_anomaly) +
aes(x = Year, y = yearly_mean)+
geom_point()+
geom_smooth(method = 'lm') +
theme_bw()The plot proves that annual temprature deltas have been increasing over time.
deltaWe will now focus on the time period between 2011-present. We divert our attention towards producing a confidence interval for the average annual deltas calculated since 2011. We will use both the statistical method and bootstrap simulation to have more confidence in the results:
#statistical method
formula_ci <- comparison %>%
filter(interval == '2011-present') %>%
group_by(year) %>%
summarise(avg_annual_delta = mean(delta),
sd_delta = sd(delta),
count = n(),
SE = sd(delta/sqrt(count)),
upper_ci = avg_annual_delta + 2*SE,
lower_ci = avg_annual_delta - 2*SE)
#print out formula_CI
formula_ci## # A tibble: 12 × 7
## year avg_annual_delta sd_delta count SE upper_ci lower_ci
## <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 2011 0.745 0.113 12 0.0327 0.810 0.680
## 2 2012 0.815 0.179 12 0.0517 0.918 0.712
## 3 2013 0.8 0.118 12 0.0340 0.868 0.732
## 4 2014 0.92 0.145 12 0.0420 1.00 0.836
## 5 2015 1.18 0.178 12 0.0515 1.28 1.07
## 6 2016 1.31 0.333 12 0.0961 1.50 1.12
## 7 2017 1.18 0.226 12 0.0653 1.31 1.05
## 8 2018 1.04 0.137 12 0.0396 1.12 0.957
## 9 2019 1.21 0.153 12 0.0441 1.30 1.12
## 10 2020 1.35 0.225 12 0.0648 1.48 1.22
## 11 2021 1.14 0.117 12 0.0339 1.20 1.07
## 12 2022 NA NA 12 NA NA NA
#bootstrap simulation
set.seed(1234)
bootstrap <- comparison %>%
filter(interval == '2011-present') %>%
specify(response = delta) %>%
generate(type = 'bootstrap') %>%
calculate(stat = 'mean')
bootstrap_ci <- bootstrap %>% get_confidence_interval(level = 0.95)
#print out confidence intervals
bootstrap_ci## # A tibble: 1 × 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 1.09 1.09
#visualise
visualise(bootstrap)library(gapminder)
library(gganimate)
library(png)
library(gifski)
#skimr::skim(gapminder)
# Use the gapminder dataset in ggplot
ggplot(data=gapminder,
aes(x=gdpPercap, y=lifeExp, size=pop, color=country)) +
# Add a point geom
geom_point(alpha=0.7, show.legend=FALSE) +
# Add some manual scaling and facets
scale_colour_manual(values=country_colors) +
scale_size(range=c(2, 12)) +
scale_x_log10() +
facet_wrap(~continent, nrow=1) +
# Animate figure with gganimate package
transition_time(year) +
ease_aes('linear') +
labs(title='Year: {frame_time}',
x='GDP per capita',
y='Life expectancy')Looking at the results of the analysis, we can see that the statistical method produces wider confidence intervals for temperature deltas, ranging from 0.13 to approximately 0.3 in width. This is probably due to the low number of observations (12 months in each year), which prohibit a more precise calculation. On the other hand, using bootstrap simulation allows to get a much narrower confidence interval. However, both methods show that temperature deltas have been positive in the time period in question and have been consistently greater than 1 since 2015.